Extending and Improving Wordnet via Unsupervised Word Embeddings

نویسندگان

Mikhail Khodak

Andrej Risteski

Christiane Fellbaum

Sanjeev Arora

چکیده

This work presents an unsupervised approach for improving WordNet that builds upon recent advances in document and sense representation via distributional semantics. We apply our methods to construct Wordnets in French and Russian, languages which both lack good manual constructions.1 These are evaluated on two new 600-word test sets for word-to-synset matching and found to improve greatly upon synset recall, outperforming the best automated Wordnets in F-score. Our methods require very few linguistic resources, thus being applicable for Wordnet construction in low-resources languages, and may further be applied to sense clustering and other Wordnet improvements.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering

In recent years, there has been an increasing interest in learning a distributed representation of word sense. Traditional context clustering based models usually require careful tuning of model parameters, and typically perform worse on infrequent word senses. This paper presents a novel approach which addresses these limitations by first initializing the word sense embeddings through learning...

متن کامل

Using Word Embeddings for Bilingual Unsupervised WSD

Unsupervised Word Sense Disambiguation (WSD) is one of the challenging problems in natural language processing. Recently, an unsupervised bilingual WSD approach has been proposed. This approach uses context aware EM formulation for estimating the sense distribution by using the co-occurrence counts of cross-linked words in comparable corpora. WordNetbased similarity measures are used for approx...

متن کامل

AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes

We present AutoExtend, a system to learn embeddings for synsets and lexemes. It is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The synset/lexeme embeddings obtained live in the same vector space as the word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet as a lexical resource, bu...

متن کامل

Supervised and Unsupervised Word Sense Disambiguation on Word Embedding Vectors of Unambigous Synonyms

This paper compares two approaches to word sense disambiguation using word embeddings trained on unambiguous synonyms. The first one is an unsupervised method based on computing log probability from sequences of word embedding vectors, taking into account ambiguous word senses and guessing correct sense from context. The second method is supervised. We use a multilayer neural network model to l...

متن کامل

An Iterative Approach for Unsupervised Most Frequent Sense Detection using WordNet and Word Embeddings

Given a word, what is the most frequent sense in which it occurs in a given corpus? Most Frequent Sense (MFS) is a strong baseline for unsupervised word sense disambiguation. If we have large amounts of sense-annotated corpora, MFS can be trivially created. However, senseannotated corpora are a rarity. In this paper, we propose a method which can compute MFS from raw corpora. Our approach itera...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

CoRR

دوره abs/1705.00217 شماره

صفحات -

تاریخ انتشار 2017

Extending and Improving Wordnet via Unsupervised Word Embeddings

نویسندگان

چکیده

منابع مشابه

Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering

Using Word Embeddings for Bilingual Unsupervised WSD

AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes

Supervised and Unsupervised Word Sense Disambiguation on Word Embedding Vectors of Unambigous Synonyms

An Iterative Approach for Unsupervised Most Frequent Sense Detection using WordNet and Word Embeddings

عنوان ژورنال:

اشتراک گذاری